Efficient querying and learning in probabilistic and temporal databases
Abstract
Probabilistic databases store, query, and manage large amounts of uncertain information. This thesis advances the state of the art in probabilistic databases in three ways:
1. We present a closed and complete data model for temporal probabilistic databases and analyze its complexity. Queries are posed via temporal deduction rules, which induce lineage formulas capturing both time and uncertainty.
2. We devise a methodology for computing the top-k most probable query answers. It is based on first-order lineage formulas that represent sets of answer candidates. Theoretically derived probability bounds on these formulas enable the pruning of low-probability answers.
3. We introduce the problem of learning tuple probabilities, which allows probabilistic databases to be updated and cleaned. We study its complexity, characterize its solutions, cast it as an optimization problem, and devise an approximation algorithm based on stochastic gradient descent.
All of the above contributions support consistency constraints and are evaluated experimentally.
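To make the learning problem in contribution 3 more concrete, here is a minimal Python sketch of learning tuple probabilities by stochastic gradient descent. It rests on simplifying assumptions that are ours, not the thesis's. Lineage formulas are taken to be propositional monotone DNFs over independent tuples, whereas the thesis works with first-order lineage and consistency constraints. Probabilities are computed exactly by Shannon expansion. The objective is the squared error between each formula's probability and a target probability, and updates are clipped to [0, 1]. The names `condition`, `prob`, and `learn` and all of their parameters are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a lineage formula in monotone DNF, i.e. a list of
# clauses, where each clause is the set of tuple ids that must all be present.
Lineage = List[FrozenSet[str]]


def condition(lineage: Lineage, t: str, present: bool) -> Lineage:
    """Set tuple t to true (drop it from every clause) or false (drop every clause containing it)."""
    if present:
        return [clause - {t} for clause in lineage]
    return [clause for clause in lineage if t not in clause]


def prob(lineage: Lineage, p: Dict[str, float]) -> float:
    """Exact probability over independent tuples via Shannon expansion (exponential worst case)."""
    if any(not clause for clause in lineage):
        return 1.0  # an empty clause is satisfied, so the formula is true
    if not lineage:
        return 0.0  # no clauses left: the formula is false
    t = next(iter(lineage[0]))  # branch on some tuple of the first clause
    return (p[t] * prob(condition(lineage, t, True), p)
            + (1.0 - p[t]) * prob(condition(lineage, t, False), p))


def learn(labeled: List[Tuple[Lineage, float]], p: Dict[str, float],
          learning_rate: float = 0.2, steps: int = 2000, seed: int = 0) -> Dict[str, float]:
    """SGD on the (assumed) loss sum_i (P(phi_i) - target_i)^2, one labeled formula per step."""
    rng = random.Random(seed)
    p = dict(p)  # do not mutate the caller's probabilities
    for _ in range(steps):
        lineage, target = rng.choice(labeled)
        err = prob(lineage, p) - target
        # Shannon expansion gives dP/dp_t = P(phi | t true) - P(phi | t false);
        # all gradients are computed before any probability is updated.
        grads = {
            t: 2.0 * err * (prob(condition(lineage, t, True), p)
                            - prob(condition(lineage, t, False), p))
            for t in set().union(*lineage)
        }
        for t, g in grads.items():
            p[t] = min(1.0, max(0.0, p[t] - learning_rate * g))  # keep a valid probability
    return p


if __name__ == "__main__":
    phi1 = [frozenset({"a", "b"})]               # lineage of answer 1: a AND b
    phi2 = [frozenset({"a"}), frozenset({"c"})]  # lineage of answer 2: a OR c
    learned = learn([(phi1, 0.4), (phi2, 0.7)], {"a": 0.5, "b": 0.5, "c": 0.5})
    print(learned, prob(phi1, learned), prob(phi2, learned))
```

In this toy example the targets are reachable (a = 0.5, b = 0.8, c = 0.4 is one solution), but the system is underdetermined, so SGD returns just one of many tuple-probability assignments that fit the labels. This illustrates why characterizing the space of solutions is part of the learning problem.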
Similar resources
An Overview on Querying and Learning in Temporal Probabilistic Databases
Probabilistic databases store, query and manage large amounts of uncertain information in an efficient way. This paper summarizes my thesis which advances the state-of-the-art in probabilistic databases in three different ways: First, we present a closed and complete data model for temporal probabilistic databases. Queries are posed via temporal deduction rules which induce lineage formulas cap...
Querying Nested Historical Relations in Heterogeneous Databases Environment
We study schema integration problems for consolidating historical information from nested relational databases in heterogeneous databases environment. These nested relations are for supporting complex objects. In heterogeneous databases systems, probabilistic partial values have been used to resolve some schema integration problems. In this paper, we extend the concept of probabilistic partial ...
Analytics over Probabilistic Unmerged Duplicates
This paper introduces probabilistic databases with unmerged duplicates (DBud), i.e., databases containing probabilistic information about instances found to describe the same real-world objects. We discuss the need for efficiently querying such databases and for supporting practical query scenarios that require analytical or summarized information. We also sketch possible methodologies and tech...
Querying and Learning in Probabilistic Databases
Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and...
Effective Representation and Efficient Management of Indeterminate Dates
Management of indeterminate temporal expressions is useful in a wide range of applications, from designing and querying temporal databases to knowledge representation and reasoning in artificial intelligence. In this paper, we focus on the representation and management of indeterminate dates, corresponding to a common use of temporal indeterminacy which can be found in (historical) texts writte...